Bag-of-Vector Embeddings of Dependency Graphs for Semantic Induction
نویسندگان
چکیده
Vector-space models, from word embeddings to neural network parsers, have many advantages for NLP. But how to generalise from fixed-length word vectors to a vector space for arbitrary linguistic structures is still unclear. In this paper we propose bag-of-vector embeddings of arbitrary linguistic graphs. A bag-of-vector space is the minimal nonparametric extension of a vector space, allowing the representation to grow with the size of the graph, but not tying the representation to any specific tree or graph structure. We propose efficient training and inference algorithms based on tensor factorisation for embedding arbitrary graphs in a bag-ofvector space. We demonstrate the usefulness of this representation by training bag-of-vector embeddings of dependency graphs and evaluating them on unsupervised semantic induction for the Semantic Textual Similarity and Natural Language Inference tasks.
منابع مشابه
Dependency Based Embeddings for Sentence Classification Tasks
We compare different word embeddings from a standard window based skipgram model, a skipgram model trained using dependency context features and a novel skipgram variant that utilizes additional information from dependency graphs. We explore the effectiveness of the different types of word embeddings for word similarity and sentence classification tasks. We consider three common sentence classi...
متن کاملLabeling Subgraph Embeddings and Cordiality of Graphs
Let $G$ be a graph with vertex set $V(G)$ and edge set $E(G)$, a vertex labeling $f : V(G)rightarrow mathbb{Z}_2$ induces an edge labeling $ f^{+} : E(G)rightarrow mathbb{Z}_2$ defined by $f^{+}(xy) = f(x) + f(y)$, for each edge $ xyin E(G)$. For each $i in mathbb{Z}_2$, let $ v_{f}(i)=|{u in V(G) : f(u) = i}|$ and $e_{f^+}(i)=|{xyin E(G) : f^{+}(xy) = i}|$. A vertex labeling $f$ of a graph $G...
متن کاملCentroid-based Text Summarization through Compositionality of Word Embeddings
The textual similarity is a crucial aspect for many extractive text summarization methods. A bag-of-words representation does not allow to grasp the semantic relationships between concepts when comparing strongly related sentences with no words in common. To overcome this issue, in this paper we propose a centroidbased method for text summarization that exploits the compositional capabilities o...
متن کاملJU-Evora: A Graph Based Cross-Level Semantic Similarity Analysis using Discourse Information
Text Analytics using semantic information is the latest trend of research due to its potential to represent better the texts content compared with the bag-of-words approaches. On the contrary, representation of semantics through graphs has several advantages over the traditional representation of feature vector. Therefore, error tolerant graph matching techniques can be used for text comparison...
متن کاملSpeech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech
In this paper, we propose a novel deep neural network architecture, Speech2Vec, for learning fixed-length vector representations of audio segments excised from a speech corpus, where the vectors contain semantic information pertaining to the underlying spoken words, and are close to other vectors in the embedding space if their corresponding underlying spoken words are semantically similar. The...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1710.00205 شماره
صفحات -
تاریخ انتشار 2017